First, we load EXECSAL2.txt into R. Then we rename all variables to more descriptive names.
| Variable | Name | Description |
|---|---|---|
| y1 | salary | Salary of executive |
| x1 | experience | Experience (in years) |
| x2 | education | Education (in years) |
| x3 | gender | Gender (1 if male, 0 if female) |
| x4 | emps_sup | Number of employees supervised |
| x5 | assets | Corporate assets (in millions of USD) |
| x6 | board_mb | Board member (1 if yes, 0 if no) |
| x7 | age | Age (in years) |
| x8 | profit | Company profits (in millions of USD) |
| x9 | int_res | Has international responsibility (1 if yes, 0 if no) |
| x10 | sales | Company’s total sales (in millions of USD) |
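
The renaming step can be sketched as follows. This is only an illustration: it rebuilds the six preview rows inline instead of reading the file, and it assumes that EXECSAL2.txt is whitespace-delimited with a header row named `y1`, `x1`, ..., `x10`, and that the three indicator variables are stored as factors.

```r
# Six preview rows from the report; the header names and the whitespace-delimited
# layout are assumptions about how EXECSAL2.txt is stored.
raw <- read.table(header = TRUE, text = "
y1 x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
11.4436 12 15 1 240 170 1 44 5 0 21
11.7753 25 14 1 510 160 1 53 9 0 28
11.3874 20 14 0 370 170 1 56 5 0 26
11.2172  3 19 1 170 170 1 26 9 0 24
11.6553 19 12 1 520 150 1 43 7 0 27
11.1619 14 13 0 420 160 1 53 9 0 27
")

# Lookup from original names to descriptive names.
new_names <- c(
  y1 = "salary", x1 = "experience", x2 = "education", x3 = "gender",
  x4 = "emps_sup", x5 = "assets", x6 = "board_mb", x7 = "age",
  x8 = "profit", x9 = "int_res", x10 = "sales"
)

execsal <- raw
names(execsal) <- unname(new_names[names(raw)])

# Treat the 0/1 indicators as factors (assumed, based on the summary output).
for (v in c("gender", "board_mb", "int_res")) {
  execsal[[v]] <- factor(execsal[[v]])
}

str(execsal)
```

With the real file, the inline `text =` argument would be replaced by the file path.
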

The first six rows of the renamed data:

| salary | experience | education | gender | emps_sup | assets | board_mb | age | profit | int_res | sales |
|---|---|---|---|---|---|---|---|---|---|---|
| 11.4436 | 12 | 15 | 1 | 240 | 170 | 1 | 44 | 5 | 0 | 21 |
| 11.7753 | 25 | 14 | 1 | 510 | 160 | 1 | 53 | 9 | 0 | 28 |
| 11.3874 | 20 | 14 | 0 | 370 | 170 | 1 | 56 | 5 | 0 | 26 |
| 11.2172 | 3 | 19 | 1 | 170 | 170 | 1 | 26 | 9 | 0 | 24 |
| 11.6553 | 19 | 12 | 1 | 520 | 150 | 1 | 43 | 7 | 0 | 27 |
| 11.1619 | 14 | 13 | 0 | 420 | 160 | 1 | 53 | 9 | 0 | 27 |

Skim summary statistics
n obs: 100
n variables: 11

Variable type: factor

| variable | missing | complete | n | n_unique | top_counts | ordered |
|---|---|---|---|---|---|---|
| board_mb | 0 | 100 | 100 | 2 | 0: 51, 1: 49, NA: 0 | FALSE |
| gender | 0 | 100 | 100 | 2 | 1: 66, 0: 34, NA: 0 | FALSE |
| int_res | 0 | 100 | 100 | 2 | 0: 82, 1: 18, NA: 0 | FALSE |

Variable type: integer

| variable | missing | complete | n | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|---|
| age | 0 | 100 | 100 | 42.84 | 9.07 | 23 | 37 | 42.5 | 49.25 | 64 | ▃▃▇▇▆▆▃▂ |
| assets | 0 | 100 | 100 | 175.1 | 15.41 | 150 | 160 | 180 | 190 | 200 | ▃▇▁▆▇▁▇▃ |
| education | 0 | 100 | 100 | 16.02 | 2.3 | 12 | 14 | 16 | 18 | 20 | ▇▃▅▅▆▆▆▁ |
| emps_sup | 0 | 100 | 100 | 340.1 | 167.18 | 60 | 187.5 | 360 | 492.5 | 600 | ▇▆▅▆▇▆▇▇ |
| experience | 0 | 100 | 100 | 13.08 | 7.34 | 1 | 7.75 | 13 | 20 | 26 | ▇▃▆▇▃▃▇▅ |
| profit | 0 | 100 | 100 | 7.7 | 1.55 | 5 | 6 | 8 | 9 | 10 | ▂▇▁▇▆▁▇▆ |
| sales | 0 | 100 | 100 | 24.83 | 2.74 | 20 | 23 | 25 | 27 | 30 | ▃▃▃▇▂▃▂▃ |

Variable type: numeric

| variable | missing | complete | n | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|---|
| salary | 0 | 100 | 100 | 11.46 | 0.26 | 10.66 | 11.28 | 11.46 | 11.61 | 12.06 | ▁▁▃▇▇▇▃▂ |

The minimum values are \(age = 23, assets = 150, education = 12, emps\_sup = 60, experience = 1, profit = 5, sales = 20, salary = 10.66\).
The maximum values are \(age = 64, assets = 200, education = 20, emps\_sup = 600, experience = 26, profit = 10, sales = 30, salary = 12.06\).
The mean values are \(age = 42.84, assets = 175.1, education = 16.02, emps\_sup = 340.1, experience = 13.08, profit = 7.7, sales = 24.83, salary = 11.46\).
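The standard deviations are \(age = 9.07, assets = 15.41, education = 2.3, emps\_sup = 167.18, experience = 7.34, profit = 1.55, sales = 2.74, salary = 0.26\). A higher standard deviation means the values are more spread out around the mean, which is not the same as having a larger range. In absolute terms, emps_sup has the largest spread and salary the smallest, although the variables are measured in different units, so their standard deviations are not directly comparable.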
The middle 50% of age ranges from 37 to 49.25, assets from 160 to 190, education from 14 to 18, emps_sup from 187.5 to 492.5, experience from 7.75 to 20, profit from 6 to 9, sales from 23 to 27, and salary from 11.28 to 11.61.
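
The quantities discussed above (minimum, maximum, mean, standard deviation, and the quartiles that bound the middle 50%) can be computed with base R. The sketch below uses a hypothetical helper, `describe()`, applied to the `experience` values from the six preview rows only, so its results will not match the 100-row figures. The layout of the summary output resembles that of the skimr package's `skim()`, but that attribution is an assumption.

```r
# Hypothetical helper: the summary quantities reported above for one numeric vector.
describe <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
  c(
    min  = min(x),
    p25  = q[1],
    p50  = q[2],
    p75  = q[3],
    max  = max(x),
    mean = mean(x),
    sd   = sd(x),
    iqr  = q[3] - q[1]  # width of the middle 50%
  )
}

# experience values from the six preview rows (not the full data set)
experience_preview <- c(12, 25, 20, 3, 19, 14)
stats <- describe(experience_preview)
print(round(stats, 2))
```

On the full data one could apply the helper to every numeric column, for example with `sapply()`.
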
From these histograms we can see the following:

| Variable | Distribution |
|---|---|
| age | Normal |
| assets | Random |
| education | Mostly Uniform |
| emps_sup | Mostly Uniform |
| experience | Random |
| profit | Random |
| sales | Skewed Right |
| salary | Skewed Left |

Age (normal): the number of executives aged between 33.77 and 51.91, that is, within one standard deviation of the mean, is larger than the number outside this range.
Assets (random): the values show no clear pattern across the sample.
Education (mostly uniform): executives are spread roughly evenly across the observed years of education.
Employees supervised (mostly uniform): executives are spread roughly evenly across the observed numbers of employees supervised.
Experience (random): the years of experience show no clear pattern across the sample.
Profit (random): the values show no clear pattern across the sample.
Sales (skewed right): most companies have total sales at or below about $25 million, with fewer companies at higher values.
Salary (skewed left): most executive salaries are at or above about $11.46 million, with fewer at lower values.
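
The age interval of 33.77 to 51.91 quoted above is the reported mean plus or minus one reported standard deviation. A short check, with the added assumption that age is exactly normal for the final line:

```r
# Reported summary statistics for age
age_mean <- 42.84
age_sd   <- 9.07

# One standard deviation either side of the mean
interval <- c(lower = age_mean - age_sd, upper = age_mean + age_sd)
print(interval)

# Share of a normal distribution lying within one sd of its mean
# (only relevant if age is treated as normally distributed)
share_within_1sd <- pnorm(1) - pnorm(-1)
print(round(share_within_1sd, 4))
```

Under that normality assumption, roughly two thirds of executives would fall inside the interval, which is why more people lie inside it than outside it.
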
Visual Descriptions:
There are more male executives (1) than female executives (0).
The counts of board members (1) and non-board members (0) are approximately the same, with slightly more non-board members.
There are far more executives without international responsibility (0) than with it (1).
Far fewer executives have 20 years of education than have between 12.5 and just under 20 years.
Ages between 30 and 45 years are the most common.
Executives aged 60 and older are the least common, especially compared with those aged 30 to 45.
The mean salary for males (1) is higher than the mean salary for females (0).
There is a positive linear relationship between a person’s experience (in years) and their salary.
There is a positive linear relationship between a person’s age and their salary.
The mean salaries for people with international responsibility (1) and with no international responsibility (0) are approximately equal.
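
Group comparisons like the mean salary by gender can be computed with `tapply()`. The sketch below uses only the six preview rows, so it illustrates the mechanics and does not reproduce the means behind the plots.

```r
# salary and gender from the six preview rows (1 = male, 0 = female)
preview <- data.frame(
  salary = c(11.4436, 11.7753, 11.3874, 11.2172, 11.6553, 11.1619),
  gender = factor(c(1, 1, 0, 1, 1, 0))
)

# Mean salary within each gender group
mean_by_gender <- tapply(preview$salary, preview$gender, mean)
print(mean_by_gender)

# Number of executives in each group
print(table(preview$gender))
```
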
Using a linear model with parallel slopes, we can predict an executive’s salary (in millions) based on their experience, education, gender, and assets.
Experience, education, gender, and assets all have a significant positive correlation with salary, so all four are included in our linear model.
\[\hat{Salary} = 10.14 + 0.027 \cdot experience + 0.022 \cdot education + 0.003 \cdot assets + 0.185 \cdot 1_{Male}(x)\]
Male executive model:
\[\hat{Salary} = 10.325 + 0.027 \cdot experience + 0.022 \cdot education + 0.003 \cdot assets\]
Female executive model:
\[\hat{Salary} = 10.14 + 0.027 \cdot experience + 0.022 \cdot education + 0.003 \cdot assets\]
In our base model, the intercept implies that a female executive with no experience, no education, and no corporate assets would have a salary of $10.14 million, although this is an extrapolation well outside the observed data. With every extra year of experience or education, holding the other variables fixed, one could expect salary to increase by $27,000 or $22,000 respectively. Male executives, on average, make $185,000 more than their female counterparts with similar experience, education, and assets.
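
To make the parallel slopes reading concrete, the sketch below plugs the reported coefficients into a hypothetical helper, `predict_salary()`, and compares a male and a female executive with the same profile. The profile (12 years of experience, 15 years of education, assets of 170) is taken from the first preview row. The commented `lm()` call is an assumption about how such a model might be fit, not the authors' code.

```r
# A fit of this form might look like (assumed, not taken from the report):
# fit <- lm(salary ~ experience + education + assets + gender, data = execsal)

# Coefficients as reported in the fitted equation
coefs <- c(intercept = 10.14, experience = 0.027, education = 0.022,
           assets = 0.003, male = 0.185)

# Hypothetical helper: predicted salary; male is 1 for male, 0 for female
predict_salary <- function(experience, education, assets, male) {
  coefs[["intercept"]] +
    coefs[["experience"]] * experience +
    coefs[["education"]] * education +
    coefs[["assets"]] * assets +
    coefs[["male"]] * male
}

male_hat   <- predict_salary(12, 15, 170, male = 1)
female_hat <- predict_salary(12, 15, 170, male = 0)

# With parallel slopes the gap is the gender coefficient, whatever the profile
gap <- male_hat - female_hat
print(c(male = male_hat, female = female_hat, gap = gap))
```

Because gender enters only through the intercept, the gap equals 0.185 for any choice of experience, education, and assets.
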
Using an interaction model, we can examine how experience and gender interact in their effect on salary.
\[\hat{Salary} = 11 + 0.026 \cdot experience + 0.174 \cdot 1_{Male}(x) + 0.002 \cdot experience \cdot 1_{Male}(x)\]
Female experience model:
\[\hat{Salary}_F = 11 + 0.026 \cdot experience\]
Male experience model:
\[\hat{Salary}_M = 11.174 + 0.028 \cdot experience\]
As we can see from the models, male executives have both a higher base salary than female executives and a marginally larger increase in salary per year of experience. However, as the graph shows, this interaction between experience and gender is negligible: both genders see pay increase at nearly the same rate.
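
The size of the interaction can be checked directly from the reported coefficients. The sketch below uses a hypothetical helper, `predict_interaction()`, and evaluates the male minus female gap at 0, 10, and 26 years of experience, where 26 is the maximum observed in the data.

```r
# Coefficients as reported in the interaction model
b0     <- 11     # intercept (female baseline)
b_exp  <- 0.026  # slope on experience for females
b_male <- 0.174  # shift in intercept for males
b_int  <- 0.002  # extra slope on experience for males

# Hypothetical helper: predicted salary; male is 1 for male, 0 for female
predict_interaction <- function(experience, male) {
  b0 + b_exp * experience + b_male * male + b_int * experience * male
}

years <- c(0, 10, 26)
gap <- predict_interaction(years, 1) - predict_interaction(years, 0)
print(data.frame(experience = years, male_minus_female = gap))
```

Across the whole observed range of experience the gap grows only from 0.174 to 0.226, which is consistent with the interaction being small relative to the gender shift itself.
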
Intuitively, education and experience are the most important variables in predicting an executive’s salary. On the plot below, experience and education (both in years) are displayed on the floor axes, and salary is displayed on the vertical axis. Salary is also color-coded, with higher salaries represented by ‘hotter’ colors. From the graph alone, we can see that more education and more experience are both associated with a higher salary in an executive position.
One might try to predict company profits based on attributes that make a good executive.
However, there is little to no correlation between any variable and profit. The closest candidate would be assets, but as the plot below shows, a model based on assets has little visible predictive accuracy.